skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Liang, Mingyu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Training LLMs in distributed environments presents significant challenges due to the complexity of model execution, deployment systems, and the vast space of configurable strategies. Although various optimization techniques exist, achieving high efficiency in practice remains difficult. Accurate performance models that effectively characterize and predict a model’s behavior are essential for guiding optimization efforts and system-level studies. We propose Lumos, a trace-driven performance modeling and estimation toolkit for large-scale LLM training, designed to accurately capture and predict the execution behaviors of modern LLMs. We evaluate Lumos on a production ML cluster with up to 512 NVIDIA H100 GPUs using various GPT-3 variants, demonstrating that it can replay execution time with an average error of just 3.3%, along with other runtime details, across different models and configurations. Additionally, we validate its ability to estimate performance for new setups from existing traces, facilitating efficient exploration of model and deployment configurations. 
    more » « less
    Free, publicly-accessible full text available May 12, 2026
  2. The min-hash sketch is a well-known technique for low-communication approximation of the Jaccard index between two input sets. Moreover, there is a folklore belief that min-hash sketch-based protocols protect the privacy of the inputs. In this paper, we consider variants of private min-hash sketch based-protocols and investigate this folklore to quantify the privacy of the min-hash sketch. We begin our investigation by presenting a highly-efficient two-party protocol for estimating the Jaccard index while ensuring differential privacy. This protocol adds Laplacian noise to the min-hash sketch counts to provide privacy protection. Then, we aim to understand what privacy, if any, is guaranteed if the results of the min-hash are released without any additional noise, such as in the case of historical data. We begin our investigation by considering the privacy of min-hash in a centralized setting where the hash functions are chosen by the min-hash functionality and are unknown to the participants. We show that in this case the min-hash output satisfies the standard definition of differential privacy (DP) without any additional noise. We next consider a more practical distributed setting, where the hash function must be shared among all parties and is typically public. Unfortunately, we show that in this public hash function setting, the min-hash output is no longer DP. We therefore consider the notion of distributional differential privacy (DDP) introduced by Bassily et al. (FOCS 2013). We show that if the honest party's set has sufficiently high min-entropy, the min-hash output achieves DDP without requiring noise. Our findings provide guidance on how to use the min-hash sketch for private Jaccard index estimation and clarify the extent to which min-hash protocols protect input privacy, refining the common belief in their privacy guarantees. 
    more » « less
    Free, publicly-accessible full text available January 13, 2026
  3. Free, publicly-accessible full text available January 9, 2026
  4. null (Ed.)
    We propose a new theoretical approach for building anonymous mixing mechanisms for cryptocurrencies. Rather than requiring a fully uniform permutation during mixing, we relax the requirement, insisting only that neighboring permutations are similarly likely. This is defined formally by borrowing from the definition of differential privacy. This relaxed privacy definition allows us to greatly reduce the amount of interaction and computation in the mixing protocol. Our construction achieves O(n * polylog(n)) computation time for mixing n addresses, whereas all other mixing schemes require O(n^2) total computation across all parties. Additionally, we support a smooth tolerance of fail-stop adversaries and do not require any trusted setup. We analyze the security of our generic protocol under the UC framework, and under a stand-alone, game-based definition. We finally describe an instantiation using ring signatures and confidential transactions. 
    more » « less
  5. Abstract We propose a new theoretical approach for building anonymous mixing mechanisms for cryptocurrencies. Rather than requiring a fully uniform permutation during mixing, we relax the requirement, insisting only that neighboring permutations are similarly likely. This is defined formally by borrowing from the definition of differential privacy. This relaxed privacy definition allows us to greatly reduce the amount of interaction and computation in the mixing protocol. Our construction achieves O ( n· polylog( n )) computation time for mixing n addresses, whereas all other mixing schemes require O ( n 2 ) total computation across all parties. Additionally, we support a smooth tolerance of fail-stop adversaries and do not require any trusted setup. We analyze the security of our generic protocol under the UC framework, and under a stand-alone, game-based definition. We finally describe an instantiation using ring signatures and confidential transactions. 
    more » « less
  6. null (Ed.)